This paper highlights measurement issues faced when
attempting to assess and interpret the results of a school improvement
project. Based on the assumption that to measure effectiveness one
must measure a wide variety of school factors, the paper presents a
broad perspective on measurement problems and dilemmas in analyzing
norm-referenced test data and data obtained through interview, self-
appraisal, and observation concerning 117 elementary schools. Trend
analysis, two forms of residual gains analysis, traditional ranking,
and expert judgment methods are compared. Data suggest that school
level residual analysis appears to provide the best approach to
selecting schools. The individual level residual scores yield a list
which overlaps with the school level approach. Trend analysis is the
most conservative and yields the fewest schools (which are also
identified by the residual score analyses). Expert opinion does not
correlate positively with residual or trend analyses. Analyses
indicated few consistencies over time. The authors conclude with two
alternatives: either schools are not consistent in their impacts from
year to year, or their metric is suspect. (Author/CM)
INTRODUCTION

I do not think that I would be overstating the case if I said that research on
effective schools is the current raging fad. There is a belief that we have
made a breakthrough in determining what makes for an effective school, and even
a hope that we have found the key for making less effective schools into more
effective ones. Although most of the studies on school effectiveness have
been conducted in urban, low-scoring schools, higher achieving schools and
school districts could hardly be considered disinterested; concern with
assessing effectiveness is evident in every school district and state
department of education.


In Montgomery County, Maryland, where Jim and I work, there is also
considerable interest in measuring school effectiveness. This interest is,
however, mixed with a good deal of concern about whether or not we really know
how to measure effectiveness and whether the traditional tools used really
tell us what we think they do. Typically, educators have used test scores to
decide whether or not a school is effective. For a long time, it was believed
that the higher the test score, the more effective the school. I think we've
made some progress in moving away from this simplistic and probably erroneous
indicator of effectiveness, and we now realize that a single test score tells
us as much or more about a student's background as about the effectiveness of
the instruction he or she has received. We have developed a number of
alternatives to this approach, controlling in many ways for "extra-school"
factors. But, at this point, we can say only that the alternatives we have
come up with are more sophisticated, but not necessarily more accurate.

I take this pessimistic viewpoint because of some work that we have been doing
in Montgomery County on methodological issues in determining school
effectiveness, comparing the schools selected as "effective" by different
methods. Last year, several members of my staff rated schools using different
analytical approaches. We then compared their results, looking for convergence
and divergence. In a second analysis, we looked more closely at the extent to
which the same schools appeared to be effective or ineffective over time.
Today, I will briefly discuss the results of each of these analyses.

COMPARISONS AMONG METHODS 

First, I will discuss the comparison of methods. Five methods were examined:
trend analysis, two forms of residual gains analysis, traditional ranking, and
expert judgment. Jim Myerberg has already described one of the methods, trend
analysis (performance of a matched longitudinal sample using the 8 NCE
criterion).
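The 8 NCE criterion can be sketched as follows. This is a minimal illustration, assuming that a school's longitudinal trend is the mean NCE change for its matched sample (the same students tested twice) and that it is compared with a county trend computed the same way. The function names and data layout are hypothetical, not the county's actual procedure.

```python
from statistics import mean

NCE_CRITERION = 8.0  # NCE points, as quoted in the paper


def longitudinal_trend(pre_nce, post_nce):
    """Mean NCE change for a matched sample (same students, two testings)."""
    if not pre_nce or len(pre_nce) != len(post_nce):
        raise ValueError("need non-empty, equal-length matched score lists")
    return mean(post - pre for pre, post in zip(pre_nce, post_nce))


def classify_trend(school_trend, county_trend, criterion=NCE_CRITERION):
    """Label a school 'high', 'low', or 'neither' relative to the county trend."""
    difference = school_trend - county_trend
    if difference >= criterion:
        return "high"
    if difference <= -criterion:
        return "low"
    return "neither"


# Hypothetical data: three matched students gain 12, 10, and 8 NCE points.
school = longitudinal_trend([40, 45, 50], [52, 55, 58])
print(school, classify_trend(school, county_trend=1.0))
```

Because the rule is a fixed point difference rather than a standardized cutoff, it tends to flag few schools, which is consistent with the trend analysis being the most conservative method in this comparison.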

The second and third used residual gain scores, with the unit of analysis
being either the individual student or student data aggregated to the school
level. The fourth approach, which I will label "traditional ranking" (referred
to below as the cross-sectional method), involved ranking schools according to
fifth grade test scores. Finally, a form of "expert judgment" was used in
which reading specialists were asked to assess the degree to which schools
were effective or ineffective in teaching reading.
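A minimal sketch of the two residual gain variants, assuming a single pretest predictor and an ordinary least-squares fit (the paper does not say which predictors were used). At the individual level, one regression is fitted across all students and the residuals are averaged by school; at the school level, pre- and posttest scores are first averaged by school and the school means are regressed. All names here are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def ols_residuals(x, y):
    """Residuals of y after a least-squares fit y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def individual_level_residuals(records):
    """records: (school_id, pretest, posttest) per student.
    Fit one regression over all students, then average residuals by school."""
    pre = [r[1] for r in records]
    post = [r[2] for r in records]
    by_school = defaultdict(list)
    for (school, _, _), res in zip(records, ols_residuals(pre, post)):
        by_school[school].append(res)
    return {s: mean(v) for s, v in by_school.items()}


def school_level_residuals(records):
    """Aggregate to school means first, then regress posttest means
    on pretest means across schools."""
    pre_by, post_by = defaultdict(list), defaultdict(list)
    for school, pre, post in records:
        pre_by[school].append(pre)
        post_by[school].append(post)
    schools = sorted(pre_by)
    x = [mean(pre_by[s]) for s in schools]
    y = [mean(post_by[s]) for s in schools]
    return dict(zip(schools, ols_residuals(x, y)))


# Hypothetical records: (school, pretest, posttest).
records = [("A", 10, 12), ("A", 20, 22), ("B", 30, 29),
           ("B", 40, 41), ("C", 50, 55), ("C", 60, 58)]
print(individual_level_residuals(records))
print(school_level_residuals(records))
```

A positive residual means the school scored higher than its pretest level would predict. The two variants can rank schools differently because the school-level fit averages away much of the student-level error variance before the regression is run.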


In comparing the results of these different approaches, we tried to keep as
many things besides method as constant as possible. For example, we made sure
that we used the same subscore on the Iowa Tests of Basic Skills as our
indicator of effectiveness: the Reading Comprehension score. We also made
sure we were looking at test scores from the same cohort of students. While
these controls sound so obvious as to be trivial, comparative studies in the
past have not always taken these precautions, either through oversight or
because for some reason it has proven impossible. We also used the same
criterion for determining outliers, the schools we were going to call
"effective" and "ineffective." That is, we standardized the test scores and
took as our outliers those schools with a "Z" score at or beyond ±1.38. This
criterion was admittedly somewhat arbitrary. It was selected because it gave
us what might be considered a "face valid" number of outliers: about 10
percent of our elementary schools at either tail. We might discuss sometime
whether it was, in fact, an appropriate choice. At least we can say it was
consistent.
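The outlier rule can be sketched as below, assuming the scores are standardized across schools with the population standard deviation (the paper does not say which form was used). `select_outliers` and `Z_CRITERION` are illustrative names only.

```python
from statistics import mean, pstdev

Z_CRITERION = 1.38  # cutoff quoted in the paper


def z_scores(values):
    """Standardize values using the mean and population standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("scores have no variance")
    return [(v - m) / s for v in values]


def select_outliers(score_by_school, criterion=Z_CRITERION):
    """Return (effective, ineffective) school ids at or beyond +/- criterion."""
    schools = list(score_by_school)
    zs = z_scores([score_by_school[s] for s in schools])
    effective = [s for s, z in zip(schools, zs) if z >= criterion]
    ineffective = [s for s, z in zip(schools, zs) if z <= -criterion]
    return effective, ineffective


# Hypothetical scores keyed by school number.
print(select_outliers({1: 0, 2: 0, 3: 0, 4: 0, 5: 10, 6: -10}))
```

Applying the same cutoff to every method keeps the comparison about the method rather than the threshold, although, as noted above, the choice of 1.38 itself is arbitrary.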

Where expert opinion was used, we tried to exert some control by asking our
experts to focus on effectiveness in the area of reading. Other aspects of
this method clearly differed from the other four, however, and no attempt was
made to control for them. For example, it is likely that the expert judgments
took into account the overall performance of the school in reading (rather
than focusing on 5th grade performance) and included an assessment of the
schools' performance over more than a single year.

Effective Schools 
Exhibit 1 shows the schools selected as "effective" by each of the methods
employed. This exhibit shows:

o Overall, 47 of the 117 (40%) elementary schools examined were nominated
  by one or more of the methods.

o Only 11 (9%) were nominated by more than one method.

o The methods differed widely in the number of schools identified as
  "effective," from a low of 3 for the trend analysis to a high of 27
  for the expert opinion method.

Further analyses examined the correlations (Phi) between the individual
methods of selecting effective schools. In calculating these correlations, we
decided to treat effectiveness as a dichotomous rather than a continuous
variable. That is, we looked only at the degree to which the alternative
methods resulted in the nomination of the same schools as effective. Clearly,
a viable alternative would have been to consider the entire continuum of
schools. Exhibit 2 presents the findings.
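Because each method yields a yes/no nomination per school, the Phi coefficient is simply the Pearson correlation on 0/1 codes, and it can be computed from the 2 x 2 table of nomination counts. A minimal sketch, with hypothetical function and argument names:

```python
from math import sqrt


def phi_coefficient(nominated_a, nominated_b, all_schools):
    """Phi for two yes/no nomination lists over the same set of schools."""
    a_set, b_set = set(nominated_a), set(nominated_b)
    n11 = n10 = n01 = n00 = 0  # cells of the 2 x 2 table
    for school in all_schools:
        in_a, in_b = school in a_set, school in b_set
        if in_a and in_b:
            n11 += 1
        elif in_a:
            n10 += 1
        elif in_b:
            n01 += 1
        else:
            n00 += 1
    # Product of the four margins: A yes, A no, B yes, B no.
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when a margin is empty")
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical example over four schools.
print(phi_coefficient([1, 2], [1, 2], range(1, 5)))
print(phi_coefficient([1, 2], [3, 4], range(1, 5)))
```

Treating nominations as dichotomous discards where non-nominated schools fall on each method's continuum, which is the trade-off acknowledged in the paragraph above.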

Exhibit 2 shows that all pairs of methods are significantly correlated except
the cross-sectional ranking method and the individual level residual score
analysis. The strongest positive relationship (.38) was found between the
trend analysis and the individual level residual analysis. The strongest
negative relationship (-.49) was found between expert judgment and the
individual level residual score analysis. Expert judgment was, however,
significantly negatively related to all three of the indices considered. We
were somewhat surprised to note the low correlations between the
cross-sectional method and the residual score analyses. In previous studies,
this relationship was found to be stronger.



Exhibit 1

Schools Selected as Effective Using Each of the Five Methods

N = 117

Columns: Trend Analysis | Expert Opinions | School Level Residuals |
Cross-Sectional | Individual Level Residuals | Total

Schools nominated by at least one method (school #): 2, 4, 7, 8, 10, 11, 12,
14, 16, 17, 20, 26, 27, 28, 30, 33, 34, 41, 43, 45, 47, 48, 54, 60, 61, 62,
63, 68, 69, 77, 78, 81, 86, 91, 98, 100, 103, 105, 106, 113, 116, 119, 122,
123, 124, 127

Totals: Trend Analysis 3; Expert Opinions 27; Individual Level Residuals 22;
all methods 47.

[School-by-method marks not recoverable from the source scan.]



Exhibit 2

Correlations Between the Alternative Methods for Selecting Effective Schools

Method      A        B        C        D
  B        .13*
  C       +.38**   -.12*
  D       +.11*    -.18**   +.13*
  E       +.28**   -.49**   +.33**   +.03

*  p < .05
** p < .01

A - Trend Analysis
B - Expert Opinion
C - School Level Residual Scores
D - Cross Sectional
E - Individual Level Residual Scores
Finally, we looked at the degree to which each of the approaches correlated
with a composite effectiveness score. The latter was determined by summing
the number of nominations and dividing the schools into two groups: those
nominated once (N = 26) and those nominated twice or more (N = 11). Exhibit 3
presents the results of the Phi analysis.
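The composite score described above can be sketched as a nomination count per school, split at two nominations. This assumes each method nominates a given school at most once; the function name and input layout are hypothetical.

```python
from collections import Counter


def composite_groups(nominations_by_method):
    """nominations_by_method: dict of method name -> iterable of school ids.
    Returns (once, twice_or_more): nominated schools split by nomination count."""
    counts = Counter()
    for schools in nominations_by_method.values():
        counts.update(set(schools))  # a method counts at most once per school
    once = sorted(s for s, c in counts.items() if c == 1)
    twice_or_more = sorted(s for s, c in counts.items() if c >= 2)
    return once, twice_or_more


# Hypothetical nominations from three methods.
print(composite_groups({"A": [1, 2], "B": [2, 3], "C": [2, 4]}))
```

Each row of the composite-score exhibit would then be a Phi coefficient between one method's 0/1 nominations and this 0/1 grouping. We assume, since the paper does not say, that the population for that calculation is the set of nominated schools.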

The data suggest that the strongest correlation is between the school level
residual analysis and the composite score (.62). Nearly as strong was the
relationship between the composite score and the individual residual score
(.59). The trend analysis and composite score also showed a strong
relationship (.47). The other two methods showed considerably weaker
correlations, although they remained significant.
Exhibit 3

Correlation Between Each Method and the Composite Effectiveness Score(1)

Method(2)   Composite Effectiveness Score
  A         +.47**
  B         -.07
  C         +.62**
  D         +.28**
  E         +.59*

** p < .01

(1) The outcome measure "Composite Effectiveness Score" is defined by the
total number of nominations received by each school. We have chosen to
divide the nominated schools into 2 categories: those receiving 1 nomination
and those receiving 2 or more.

(2) A - Trend Analysis
    B - Expert Opinion
    C - School Level Residual Scores
    D - Cross Sectional
    E - Individual Level Residual Scores



Ineffective Schools

Exhibit 4 presents the findings for schools judged to be "ineffective." It
should be noted that expert opinion was not used as a means for selecting
ineffective schools because it was deemed preferable not to ask the
specialists to single out schools as being particularly ineffective in
teaching reading. The exhibit shows:

o Overall, 30 of the 117 (26%) elementary schools were nominated by one
  or more of the methods.

o Only 7 (6%) were nominated by more than one method.

o The methods differed in the number of schools identified as
  ineffective. The trend analysis yielded only one school. The
  individual residual score method yielded the largest number: 23.

It is interesting to note that four schools appear on both lists, the
effective schools and the ineffective schools. In all four cases, the schools
had been nominated as effective by expert judgment.

Correlations among the measures are presented in Exhibit 5. This matrix is
somewhat spotty because of the absence of expert opinion and the elimination
of the trend analysis, since it yielded only one case. It is, however, worth
pointing out a couple of findings. The school level residual analysis appears
to be uncorrelated with either the cross-sectional or the individual level
residual analysis, a finding which is not consistent with the relationships
among the methods for selecting "effective" schools. Interestingly, the
cross-sectional and individual residual analyses are strongly negatively
correlated.

Exhibit 6 presents the correlation between each of the measures and the
composite ineffectiveness score. As with the analysis of effective schools,
the highest correlation is between the composite score and the school level
residual analysis. The cross-sectional method is unrelated to the composite
score.



Exhibit 4

Schools Selected as Ineffective Using Each of the Five Methods

N = 117

Columns: Trend Analysis | Expert Opinions | School Level Residuals |
Cross-Sectional | Individual Level Residuals | Total

Schools nominated by at least one method (school #): 3, 12, 15, 22, 29, 30,
35, 42, 51, 57, 61, 63, 65, 67, 70, 72, 75, 79, 80, 81, 84, 85, 89, 93, 108,
110, 114, 124, 126, 115

Totals: Trend Analysis 1; Individual Level Residuals 23; all methods 30.

[School-by-method marks not recoverable from the source scan.]
Exhibit 5

Correlation Between the Alternative Methods of Selecting Ineffective Schools

N = 30

Method      D        E
  C        .04     -.12
  D                -.50**

** p < .01

A - Trend Analysis
B - Expert Opinion
C - School Level Residual Scores
D - Cross Sectional
E - Individual Level Residual Scores

Note: Since the trend analysis yielded only one case, we have eliminated it.
Exhibit 6

Correlations Between Each Method and the Composite Ineffectiveness Score(1)

Method(2)   Composite Ineffectiveness Score
  C         +.51**
  D         +.03
  E         +.30**

** p < .01

(1) This score is derived by summing the number of nominations received for
each school. The resulting group is then divided into two groups: those
receiving 1 nomination and those receiving 2 or more.

(2) A - Trend Analysis
    B - Expert Opinion
    C - School Level Residual Scores
    D - Cross Sectional
    E - Individual Level Residual Scores
Conclusions

Of the methods examined here, the school level residual analysis appears to
provide the best approach to selecting schools. The individual level
residual score yields a list which to a large extent overlaps with that
produced by the school level approach but also contains a number of others.
It has been suggested that this is because it fails to account adequately for
error variance in the individual scores.

The trend analysis is the most conservative, yielding the fewest schools.
Nonetheless, the schools identified by this method are also identified by the
residual score analyses. It may well be that it is only a matter of criterion
that keeps the trend and school level approaches from converging more
completely. We explored this by doing a supplementary analysis where, instead
of the 8 NCE criterion, we used the 1.38 Z criterion that had been employed in
the residual gains methods. Using this standard, the trend analysis approach
naturally yielded more schools. Two of these, however, were not identified by
the residual score analysis and may be cases of regression to the mean. We
need to further explore this method, and the strengths and weaknesses of using
it with different criteria for school selection. While it may be lacking in a
certain amount of elegance, the approach has a considerable amount of appeal
because it is so easy to understand and apply.

The role of the other two analyses appears to be far less clear. The data
certainly suggest that the cross-sectional approach can be misleading, and
this study supports others which have suggested that it is better to avoid
making judgments regarding school effectiveness on such data.

The usefulness of expert opinion remains a question. As this study shows,
expert opinion does not correlate positively with the residual or trend
analyses explored here. On the other hand, as we shall see in the next
section, these methods do not correlate well with each other from year to
year, a rather disturbing finding for those trying to identify effective
schools! The possibility should not, therefore, be dismissed that expert
opinion is providing useful information, and its lack of correlation with the
other methods does not necessarily mean that expert opinion can or should be
dismissed.

CONSISTENCY ACROSS YEARS

The second series of analyses looked at consistency across years. If a school
really is effective (or ineffective), it would be expected that analyses would
show the school to be an outlier with some consistency. We, therefore, looked
at consistency over time using both the NCE trend analysis and the residual
gains methods. Exhibit 7 shows the findings from the trend analysis,
considering cohorts. The black boxes show scores that are high, the shaded
boxes those that are low. Few consistencies over time are found. Further,
when correlations were computed by John Larson of my staff for two cohorts
using residual gains analyses, the findings were similar. Across the two
cohorts, the correlation is only .24 for reading scores and .32 for
mathematics. If we consider these data in terms of variance explained, we
come up with a whopping four to nine percent.
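The year-to-year check amounts to correlating each school's residual score in one cohort with its score in the next, then squaring the correlation to get the proportion of variance the two share. A minimal sketch with hypothetical names; it implements Pearson's r directly rather than relying on any particular statistics package.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        raise ValueError("a series has no variance")
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (sx * sy)


def variance_explained(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r * r


# Hypothetical residual scores for the same five schools in two cohorts.
cohort_1 = [1.2, -0.4, 0.3, -1.1, 0.0]
cohort_2 = [0.2, 0.5, -0.3, -0.6, 0.2]
r = pearson_r(cohort_1, cohort_2)
print(round(r, 2), round(variance_explained(r), 3))
```

Because variance explained is the square of r, even a moderate year-to-year correlation leaves most of the variation in one year's results unaccounted for by the other's.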

We are left with two alternatives: we can conclude either that schools are not
consistent in their impacts from year to year or that our metric is suspect,
if not faulty. Intuitively, we have to suspect the metric. Unfortunately, we
have not found a more satisfactory substitute. In any case, these findings
are of great concern to us in our jobs and, in addition, make us regard
existing research on school effectiveness with some degree of skepticism.

EXHIBIT 7

Schools With Substantial Longitudinal Trends in Each of the Last Four Years
(presented in four parts: First, Second, Third, and Fourth Quarter)

Years: 1978-79, 1979-80, 1980-81, 1981-82
Columns for each year: No., RC, TL, TM, and C (1978-79, 1979-80) or TB
(1980-81, 1981-82)

Black box - School longitudinal trend was at least 8 NCE points higher than
            the county trend.
Shaded box - School longitudinal trend was at least 8 NCE points lower than
             the county trend.

No. - Number Tested
RC - Reading Comprehension
TL - Total Language
TM - Total Math
C - Composite
TB - Total Battery

[School-by-year entries not recoverable from the source scan.]
ABSTRACT

THE MEASUREMENT OF EFFECTIVENESS
Some Methodological Problems

By Dr. Joy A. Frechtling

This paper highlights measurement issues faced when attempting to assess and
interpret the results of a school improvement project. Based on the
assumption that to measure effectiveness one must measure a wide variety of
school factors, the paper presents a broad perspective on measurement
problems and dilemmas in analyzing norm-referenced test data and data
obtained through interview, self-appraisal, and observation.

The central question is the following: How does one move from correlation
analysis to assessment of change when measuring instruments as well as
methodology are far from satisfactory?